This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.
本指南分享了提高GPT效果的策略和技巧。这里描述的方法有时可以组合使用以获得更好的效果。我们鼓励尝试不同的方法,找到最适合您的方法。

Some of the examples demonstrated here currently work only with our most capable model, gpt-4. If you don't yet have access to gpt-4 consider joining the waitlist. In general, if you find that a GPT model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
这里展示的一些示例目前仅适用于我们最强大的模型 gpt-4 。如果您还没有 gpt-4 的访问权限,请考虑加入等待列表。通常情况下,如果您发现GPT模型在某项任务上失败,而更强大的模型可用,则通常值得尝试使用更强大的模型再次尝试。

GPTs can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less GPTs have to guess at what you want, the more likely you’ll get it.
GPT不能读取你的思想。如果输出太长,请要求简短的回复。如果输出太简单,请要求专家级的写作。如果你不喜欢格式,请展示你想要看到的格式。GPT越少猜测你想要什么,你得到它的可能性就越大。

Tactics: 策略:

GPTs can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.
GPT可以自信地编造虚假答案,特别是在涉及深奥话题、引用和URL时。就像一张笔记纸可以帮助学生在考试中表现更好一样,向GPT提供参考文本可以帮助更少地编造答案。

Tactics: 策略:

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
就像在软件工程中将复杂系统分解为一组模块化组件一样,将任务提交给GPT也是同样的好习惯。复杂任务往往比简单任务具有更高的错误率。此外,复杂任务通常可以重新定义为一系列简单任务的工作流程,其中早期任务的输出用于构建后续任务的输入。

Tactics: 策略:

If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a chain of reasoning before an answer can help GPTs reason their way toward correct answers more reliably.
如果让你计算17乘以28,你可能不会立刻知道答案,但是你可以花时间算出来。同样地,当GPT试图立即回答问题时,会出现更多的推理错误,而不是花时间计算出答案。在回答问题之前要求给出推理链,可以帮助GPT更可靠地推理出正确的答案。

Tactics: 策略:

Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents. A code execution engine can help GPTs do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.
通过向GPT提供其他工具的输出来弥补其弱点。例如,文本检索系统可以告诉GPT相关文档的信息。代码执行引擎可以帮助GPT进行数学计算和代码运行。如果一个任务可以通过工具而不是GPT更可靠或更高效地完成,那么将其卸载以获得最佳效果。

Tactics: 策略:

Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an "eval").
如果您能够测量性能,那么提高性能就更容易了。在某些情况下,对提示进行修改可能会在一些孤立的示例上实现更好的性能,但会导致更具代表性的示例集上的整体性能变差。因此,为了确保更改对性能的净影响是积极的,可能需要定义一个全面的测试套件(也称为“评估”)。

Tactic: 策略:

Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
以上列出的每种策略都可以通过具体的战术来实现。这些战术旨在提供尝试的想法。它们并不是完全全面的,您应该随意尝试这里没有涵盖的创意想法。

In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.
为了获得高度相关的回复,请确保请求提供任何重要的细节或背景。否则,您将让模型猜测您的意思。

Worse 更糟的。Better 更好
How do I add numbers in Excel?
我该如何在Excel中添加数字?
How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".
我该如何在Excel中累加一行美元金额?我想自动为整个工作表中的所有行执行此操作,所有总计都会出现在名为“总计”的列的右侧。
Who’s president? 谁是总统?Who was the president of Mexico in 2021, and how frequently are elections held?
2021年墨西哥的总统是谁?选举有多频繁举行?
Write code to calculate the Fibonacci sequence.
编写代码计算斐波那契数列。
Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.
编写一个TypeScript函数来高效地计算斐波那契数列。请在代码中添加注释,解释每个部分的作用以及为什么要这样编写。
Summarize the meeting notes.
总结会议记录。
Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
请用一段话总结会议记录。然后,编写一个标记列表,列出每个发言者及其要点。最后,列出发言者建议的下一步行动或行动项目(如果有的话)。

The system message can be used to specify the persona used by the model in its replies.
系统消息可用于指定模型在回复中使用的人设。

SYSTEM 系统
When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
当我请求帮助写作时,你会回复一个文档,其中每个段落至少包含一个笑话或俏皮的评论。
USER 把文本翻译成简体中文:
Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.
写一封感谢信给我的钢螺栓供应商,感谢他们在短时间内准时交货。这使我们能够交付一份重要的订单。
Open in Playground 在游乐场开放

Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.
像三重引号、XML标签、章节标题等分隔符可以帮助划分文本的不同部分以便进行不同的处理。

USER 把文本翻译成简体中文:
Summarize the text delimited by triple quotes with a haiku. """insert text here"""
请将文本翻译成简体中文: 用俳句总结三引号之间的文本。 请提供需要翻译的文本。
Open in Playground 在游乐场开放
SYSTEM 系统
You will be provided with a pair of articles (delimited with XML tags) about the same topic. First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.
您将会收到一对关于同一主题的文章(用XML标签分隔)。首先总结每篇文章的论点。然后指出哪篇文章提出了更好的论点,并解释原因。
USER 把文本翻译成简体中文:
<article> insert first article here </article> <article> insert second article here </article>
在此插入第一篇文章
在此插入第二篇文章
Open in Playground 在游乐场开放
SYSTEM 系统
You will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
您将获得一篇论文摘要和一个建议的标题。论文标题应该让读者对论文的主题有一个好的了解,同时也应该引人注目。如果标题不符合这些标准,请提出5个备选方案。
USER 把文本翻译成简体中文:
Abstract: insert abstract here Title: insert title here
摘要:在此插入摘要。 标题:在此处插入标题
Open in Playground 在游乐场开放

For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is the more important it is to disambiguate task details. Don’t make GPTs work to understand exactly what you are asking of them.
对于这样简单的任务,使用分隔符可能不会对输出质量产生影响。然而,任务越复杂,消除歧义就越重要。不要让GPTs费力去理解你要求的具体内容。

Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.
有些任务最好被规定为一系列步骤。明确地写出这些步骤可以让模型更容易地遵循它们。

SYSTEM 系统
Use the following step-by-step instructions to respond to user inputs. Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ". Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".
请按照以下逐步说明来响应用户输入。 步骤1 - 用户将提供三引号中的文本。使用以“摘要:”为前缀的一句话总结此文本。 步骤2 - 将步骤1中的摘要翻译成西班牙语,并在前面加上“翻译:”。
USER 把文本翻译成简体中文:
"""insert text here""" 请在此处插入文本。
Open in Playground 在游乐场开放

Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. For example, if you intend for the model to copy a particular style of responding to user queries which is difficult to describe explicitly. This is known as "few-shot" prompting.
通常,提供适用于所有示例的一般说明比通过示例演示任务的所有排列组合更有效,但在某些情况下,提供示例可能更容易。例如,如果您打算让模型复制一种难以明确描述的特定用户查询响应风格。这被称为“少样本”提示。

SYSTEM 系统
Answer in a consistent style.
保持一致的风格回答。
USER 把文本翻译成简体中文:
Teach me about patience.
教我耐心。
ASSISTANT 助手
The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
刻出最深峡谷的河流源于一处不起眼的泉眼;最宏伟的交响乐从一支单独的音符开始;最复杂的挂毯由一根孤独的线缕起。
USER 把文本翻译成简体中文:
Teach me about the ocean.
教我关于海洋的知识。
Open in Playground 在游乐场开放

You can ask the model to produce outputs that are of a given target length. The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
您可以要求模型生成指定目标长度的输出。目标输出长度可以根据单词、句子、段落、项目符号等进行指定。但请注意,指示模型生成特定数量的单词并不具有高精度。模型可以更可靠地生成具有特定段落数或项目符号的输出。

USER 把文本翻译成简体中文:
Summarize the text delimited by triple quotes in about 50 words. """insert text here"""
请将三引号包含的文本总结成约50个字。 请提供需要翻译的文本。
Open in Playground 在游乐场开放
USER 把文本翻译成简体中文:
Summarize the text delimited by triple quotes in 2 paragraphs. """insert text here"""
请提供需要翻译的文本。
Open in Playground 在游乐场开放
USER 把文本翻译成简体中文:
Summarize the text delimited by triple quotes in 3 bullet points. """insert text here"""
将三重引号包含的文本总结为3个要点: - 文本被三重引号包含,表示它是一个多行字符串。 - 总结文本需要提取出其中的关键信息,概括出主要内容。 - 通过将文本分解为几个部分,可以更好地理解和总结其含义。 请在此处插入文本。
Open in Playground 在游乐场开放

If we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.
如果我们能够为模型提供与当前查询相关的可信信息,那么我们就可以指示模型使用提供的信息来组成其答案。

SYSTEM 系统
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
使用三重引号分隔的提供的文章来回答问题。如果在文章中找不到答案,请写“我找不到答案”。
USER 把文本翻译成简体中文:
<insert articles, each delimited by triple quotes> Question: <insert question here>
请提供要翻译的文章,每篇文章之间用三个引号分隔。 问题: <插入问题内容>
Open in Playground 在游乐场开放

Given that GPTs have limited context windows, in order to apply this tactic we need some way to dynamically lookup information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval" for more details on how to implement this.
鉴于GPT的上下文窗口有限,为了应用这种策略,我们需要一些动态查找与所提问问题相关信息的方法。嵌入可以用于实现高效的知识检索。有关如何实现此操作的更多详细信息,请参见“使用基于嵌入的搜索实现高效的知识检索”策略。

If the input has been supplemented with relevant knowledge, it's straightforward to request that the model add citations to its answers by referencing passages from provided documents. Note that citations in the output can then be verified programmatically by string matching within the provided documents.
如果输入已经补充了相关知识,那么请求模型通过引用提供的文档中的段落来为其答案添加引用就很简单了。请注意,输出中的引用可以通过在提供的文档中进行字符串匹配来进行程序验证。

SYSTEM 系统
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format for to cite relevant passages ({"citation": …}).
您将获得一个由三个引号分隔的文档和一个问题。您的任务是仅使用提供的文档回答问题,并引用用于回答问题的文档段落。如果文档不包含回答此问题所需的信息,则只需写下:“信息不足”。如果提供了问题的答案,则必须使用引文进行注释。使用以下格式引用相关段落({"citation": …})。
USER 把文本翻译成简体中文:
"""<insert document here>""" Question: <insert question here>
请在此处插入文档。 问题: <插入问题内容>
Open in Playground 在游乐场开放

For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).
对于需要处理不同情况的大量独立指令集的任务,首先对查询类型进行分类并使用该分类确定所需的指令可能是有益的。这可以通过定义固定类别和硬编码与处理给定类别任务相关的指令来实现。这个过程也可以递归地应用于将任务分解为一系列阶段。这种方法的优点是每个查询仅包含执行任务下一阶段所需的指令,这可能会导致与使用单个查询执行整个任务相比较低的错误率。这也可能会导致更低的成本,因为更大的提示费用更高(请参阅定价信息)。

Suppose for example that for a customer service application, queries could be usefully classified as follows:
例如,假设对于一个客户服务应用程序,查询可以有以下有用的分类:

SYSTEM 系统
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary. Primary categories: Billing, Technical Support, Account Management, or General Inquiry. Billing secondary categories: - Unsubscribe or upgrade - Add a payment method - Explanation for charge - Dispute a charge Technical Support secondary categories: - Troubleshooting - Device compatibility - Software updates Account Management secondary categories: - Password reset - Update personal information - Close account - Account security General Inquiry secondary categories: - Product information - Pricing - Feedback - Speak to a human
您将获得客户服务查询。将每个查询分类为主要类别和次要类别。以json格式提供输出,键为:primary和secondary。 主要类别:账单、技术支持、账户管理或一般查询。 账单次要分类: 取消订阅或升级 添加支付方式 收费说明 争议一项费用 技术支持的次要分类: 故障排除 设备兼容性 软件更新 账户管理的次级分类: 重置密码 更新个人信息 关闭账户 账户安全 常规查询的次级分类: 产品信息 定价 反馈 请与人类交谈。
USER 把文本翻译成简体中文:
I need to get my internet working again.
我需要让我的互联网重新工作起来。
Open in Playground 在游乐场开放

Based on the classification of the customer query, a set of more specific instructions can be provided to a GPT model to handle next steps. For example, suppose the customer requires help with "troubleshooting".
根据客户查询的分类,可以为GPT模型提供一组更具体的指令来处理下一步。例如,假设客户需要“故障排除”帮助。

SYSTEM 系统
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by: - Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time. - If all cables are connected and the issue persists, ask them which router model they are using - Now you will advise them how to restart their device: -- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection. -- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection. - If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}. - If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme: <insert primary/secondary classification scheme from above here>
您将收到需要在技术支持环境中进行故障排除的客户服务查询。通过以下方式帮助用户: 请让他们检查路由器的所有电缆连接情况。请注意,电缆随着时间的推移可能会松动。 如果所有电缆都已连接,问题仍然存在,请询问他们使用的路由器型号。 现在您将告诉他们如何重新启动他们的设备: -- 如果型号为MTD-327J,请建议他们按下红色按钮并保持5秒钟,然后等待5分钟再测试连接。 -- 如果型号为MTD-327S,请建议他们拔掉插头并重新插上,然后等待5分钟再测试连接。 如果客户在重启设备并等待5分钟后仍然存在问题,请输出{"请求IT支持"},将他们连接到IT支持。 如果用户开始提出与此主题无关的问题,请确认他们是否想结束有关故障排除的当前聊天,并根据以下方案对其请求进行分类: 请在此处插入上述的主/次分类方案。
USER 把文本翻译成简体中文:
I need to get my internet working again.
我需要让我的互联网重新工作起来。
Open in Playground 在游乐场开放

Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.
请注意,该模型已被指示发出特殊字符串以指示对话状态何时发生变化。这使我们能够将我们的系统转变为状态机,其中状态确定注入哪些指令。通过跟踪状态,确定在该状态下哪些指令是相关的,以及可选地从该状态允许哪些状态转换,我们可以在用户体验周围设置防护栏,这是通过较少结构化的方法难以实现的。

Since GPTs have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.
由于GPT模型具有固定的上下文长度,因此在用户和助手之间的对话中,如果整个对话都包含在上下文窗口中,那么对话将无法无限制地继续下去。

There are various workarounds to this problem, one of which is to summarize previous turns in the conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message. Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.
有多种解决这个问题的方法,其中之一是总结先前对话中的内容。一旦输入的长度达到预定的阈值,这可能会触发一个查询,总结部分对话内容,之前对话的摘要可以作为系统消息的一部分包含在内。或者,之前的对话可以在整个对话期间异步地在后台进行总结。

An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval".
另一种解决方案是动态选择与当前查询最相关的先前对话部分。请参阅策略“使用基于嵌入的搜索实现高效的知识检索”。

Since GPTs have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.
由于GPT模型具有固定的上下文长度,因此在单个查询中,它们无法用于总结超过上下文长度减去生成摘要长度的文本。

To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries. This process can proceed recursively until an entire document is summarized. If it’s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point. The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
为了总结一本非常长的文档,比如一本书,我们可以使用一系列的查询来总结文档的每个部分。将各个部分的摘要连接起来,再进行总结,就可以得到摘要的摘要。这个过程可以递归进行,直到整个文档被总结。如果需要使用早期部分的信息来理解后面的部分,那么一个有用的技巧是在总结当前内容的同时,包括在书中给定点之前的文本的运行摘要。OpenAI曾经使用GPT-3的变体进行过研究,以评估这种总结书籍的方法的有效性。

Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student’s solution to a math problem. The most obvious way to approach this is to simply ask the model if the student's solution is correct or not.
有时候,我们明确地指示模型在得出结论之前从第一原理出发进行推理,会得到更好的结果。例如,假设我们想让一个模型评估学生解决数学问题的解决方案。最明显的方法是简单地询问模型学生的解决方案是否正确。

SYSTEM 系统
Determine if the student's solution is correct or not.
判断学生的解答是否正确。
USER 把文本翻译成简体中文:
Problem Statement: I'm building a solar power installation and I need help working out the financials. - Land costs $100 / square foot - I can buy solar panels for $250 / square foot - I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot What is the total cost for the first year of operations as a function of the number of square feet. Student's Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
问题陈述:我正在建造一个太阳能发电装置,我需要帮助计算财务。 土地成本为每平方英尺100美元。 我可以以每平方英尺250美元的价格购买太阳能电池板。 我为维护谈判了一份合同,每年的费用是固定的10万美元,另外还需要支付每平方英尺10美元的费用。 作为面积数量的函数,第一年运营的总成本是多少? 学生的解决方案:令x为安装面积的平方英尺。 1. 土地成本:100x 2. 太阳能电池板成本:250x 3. 维护成本:100,000 + 100x 总成本:100x + 250x + 100,000 + 100x = 450x + 100,000。
ASSISTANT 助手
The student's solution is correct.
学生的解答是正确的。
Open in Playground 在游乐场开放

But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
但是学生的解决方案实际上是不正确的!我们可以通过提示模型首先生成自己的解决方案来成功地让模型注意到这一点。

SYSTEM 系统
First work out your own solution to the problem. Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. Don't decide if the student's solution is correct until you have done the problem yourself.
首先,自己先解决问题。然后再将自己的解决方案与学生的解决方案进行比较,并评估学生的解决方案是否正确。在自己解决问题之前,不要决定学生的解决方案是否正确。
USER 把文本翻译成简体中文:
Problem Statement: I'm building a solar power installation and I need help working out the financials. - Land costs $100 / square foot - I can buy solar panels for $250 / square foot - I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot What is the total cost for the first year of operations as a function of the number of square feet. Student's Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
问题陈述:我正在建造一个太阳能发电装置,我需要帮助计算财务。 土地成本为每平方英尺100美元。 我可以以每平方英尺250美元的价格购买太阳能电池板。 我为维护谈判了一份合同,每年的费用是固定的10万美元,另外还需要支付每平方英尺10美元的费用。 作为面积数量的函数,第一年运营的总成本是多少? 学生的解决方案:令x为安装面积的平方英尺。 1. 土地成本:100x 2. 太阳能电池板成本:250x 3. 维护成本:100,000 + 100x 总成本:100x + 250x + 100,000 + 100x = 450x + 100,000。
ASSISTANT 助手
Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 10x Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000 The student's solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.
设 x 为安装面积的平方英尺。 1. 土地成本:100x 2. 太阳能电池板成本:250x 3. 维护成本:100,000 + 10x 总成本:100x + 250x + 100,000 + 10x = 360x + 100,000 学生的解决方案是错误的。他们在维护成本计算中犯了一个错误,使用了100x而不是10x。第一年运营的正确总成本是360x + 100,000。
Open in Playground 在游乐场开放

The previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question. For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user. For example, in tutoring applications we may want to encourage students to work out their own answers, but a model’s reasoning process about the student’s solution could reveal the answer to the student.
前面的策略表明,有时模型在回答具体问题之前,详细地推理问题是非常重要的。对于某些应用程序,模型用于得出最终答案的推理过程可能不适合与用户共享。例如,在辅导应用程序中,我们可能希望鼓励学生自己解决问题,但是模型对学生解决方案的推理过程可能会向学生透露答案。

Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.
内心独白是一种可以用来缓解这种情况的策略。内心独白的想法是指示模型将那些用户不应该看到的输出部分放入结构化格式中,以便于解析。然后在向用户呈现输出之前,对输出进行解析,只有部分输出被展示给用户。

SYSTEM 系统
Follow these steps to answer the user queries. Step 1 - First work out your own solution to the problem. Don't rely on the student's solution since it may be incorrect. Enclose all your work for this step within triple quotes ("""). Step 2 - Compare your solution to the student's solution and evaluate if the student's solution is correct or not. Enclose all your work for this step within triple quotes ("""). Step 3 - If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes ("""). Step 4 - If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:".
请按照以下步骤回答用户的查询。 步骤1 - 首先自己解决问题,不要依赖学生的解决方案,因为它可能是不正确的。在这一步中,将所有工作都放在三重引号(""")中。 步骤2 - 将您的解决方案与学生的解决方案进行比较,并评估学生的解决方案是否正确。请在此步骤中使用三重引号(""")将所有工作包含在内。 第三步 - 如果学生犯了错误,请确定您可以给学生什么提示,而不会泄露答案。请将此步骤的所有工作都放在三重引号(""")中。 第四步 - 如果学生犯了错误,请向学生提供上一步的提示(在三重引号之外)。不要写“第四步 - ...”,而是写“提示:”。
USER 把文本翻译成简体中文:
Problem Statement: <insert problem statement> Student Solution: <insert student solution>
问题陈述: <插入问题陈述> 学生解决方案: <插入学生解决方案>
Open in Playground 在游乐场开放

Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
或者,可以通过一系列查询来实现,其中除了最后一个查询以外,所有查询的输出都对最终用户隐藏。

First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
首先,我们可以要求模型自行解决问题。由于这个初始查询不需要学生的解决方案,因此可以省略。这提供了额外的优势,即模型的解决方案不会受到学生尝试解决方案的影响。

USER 把文本翻译成简体中文:
<insert problem statement>
请提供问题陈述。
Open in Playground 在游乐场开放

Next, we can have the model use all available information to assess the correctness of the student’s solution.
接下来,我们可以让模型利用所有可用信息来评估学生的解决方案的正确性。

SYSTEM 系统
Compare your solution to the student's solution and evaluate if the student's solution is correct or not.
请将您的解决方案与学生的解决方案进行比较,并评估学生的解决方案是否正确。
USER 把文本翻译成简体中文:
Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>"""
问题陈述:“<插入问题陈述>” 您的解决方案:"""<插入模型生成的解决方案>""" 学生的解决方案:"""<插入学生的解决方案>"""
Open in Playground 在游乐场开放

Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
最终,我们可以让模型使用自己的分析能力,以乐于助人的导师角色构建回复。

SYSTEM 系统
You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment.
你是一名数学导师。如果学生犯了错误,以不透露答案的方式给学生提供提示。如果学生没有犯错误,只需给他们一个鼓励的评论即可。
USER 把文本翻译成简体中文:
Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>""" Analysis: """<insert model generated analysis from previous step>"""
问题陈述:"""<插入问题陈述>""" 您的解决方案:"""<插入模型生成的解决方案>""" 学生的解决方案:"""<插入学生的解决方案>""" 分析:"""<插入前一步生成的模型分析>"""
Open in Playground 在游乐场开放

Suppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop. If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with followup queries to find any excerpts it missed on previous passes.
假设我们正在使用一个模型来列出与特定问题相关的源摘录。在列出每个摘录后,模型需要确定它是否应该开始写另一个摘录,还是停止。如果源文档很大,模型通常会过早停止并未能列出所有相关的摘录。在这种情况下,通过提示模型进行后续查询以查找之前漏掉的摘录,通常可以获得更好的性能。

SYSTEM 系统
You will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: "What significant paradigm shifts have occurred in the history of artificial intelligence." Ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context. Provide output in JSON format as follows: [{"excerpt": "..."}, ... {"excerpt": "..."}]
您将获得一个由三个引号分隔的文档。您的任务是选择与以下问题相关的摘录:“人工智能历史上发生了哪些重大的范式转变。” 请确保摘录包含所有必要的上下文以便解释 - 换句话说,不要提取缺少重要上下文的小片段。按以下JSON格式提供输出: [{"excerpt": "..."}, 请提供需要翻译的文本。 {"摘录": "..."}]
USER 把文本翻译成简体中文:
"""<insert document here>"""
请在此处插入文档。
ASSISTANT 助手
[{"excerpt": "the model writes an excerpt here"}, ... {"excerpt": "the model writes another excerpt here"}]
[{"摘录": "模型在此处撰写摘录"}, 请提供需要翻译的文本。 {"摘录": "模型在此写入另一个摘录"}]
USER 把文本翻译成简体中文:
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context.
有更相关的摘录吗?请注意不要重复摘录。同时确保摘录包含解释所需的所有相关上下文 - 换句话说,不要提取缺少重要上下文的小片段。
Open in Playground 在游乐场开放

A model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses. For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc…) to the model’s input. Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.
如果外部信息作为模型的输入之一提供,模型可以利用这些信息。这可以帮助模型生成更具见解和更新的响应。例如,如果用户询问有关特定电影的问题,将有关该电影的高质量信息(例如演员、导演等)添加到模型的输入中可能会很有用。可以使用嵌入来实现有效的知识检索,以便在运行时动态地将相关信息添加到模型输入中。

A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings. This fact, along with the existence of fast vector search algorithms means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored. Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. closest together in the embedding space).
文本嵌入是一种向量,可以衡量文本字符串之间的相关性。相似或相关的字符串将比不相关的字符串更接近。这个事实,加上快速向量搜索算法的存在,意味着嵌入可以用于实现高效的知识检索。特别是,文本语料库可以分成块,每个块可以嵌入和存储。然后,可以嵌入给定的查询,并执行向量搜索,以找到与查询最相关的语料库中嵌入的文本块(即在嵌入空间中最接近的文本块)。

Example implementations can be found in the OpenAI Cookbook. See the tactic “Instruct the model to use retrieved knowledge to answer queries” for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.
在OpenAI Cookbook中可以找到示例实现。请参见策略“指导模型使用检索到的知识来回答查询”的示例,了解如何使用知识检索来最小化模型编造不正确事实的可能性。

GPTs cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations. In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backtics. After an output is produced, the code can be extracted and run. Finally, if necessary, the output from the code execution engine (i.e. Python interpreter) can be provided as an input to the model for the next query.
GPTs不能单独准确地执行算术或长时间的计算。在需要这样做的情况下,可以指示模型编写和运行代码,而不是进行自己的计算。特别是,可以指示模型将要运行的代码放入指定的格式,例如三个反引号。生成输出后,可以提取并运行代码。最后,如果必要,代码执行引擎(即Python解释器)的输出可以作为下一个查询的输入提供给模型。

SYSTEM 系统
You can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.
您可以使用三个反引号将Python代码括起来并执行,例如```代码放在这里```。使用它来进行计算。
USER 把文本翻译成简体中文:
Find all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.
找出以下多项式的所有实根:3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10。
Open in Playground 在游乐场开放

Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
代码执行的另一个好用例是调用外部API。如果模型被指导正确使用API,它可以编写利用它的代码。可以通过提供文档和/或代码示例来指导模型如何使用API。

SYSTEM 系统
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends: ```python import message message.write(to="John", message="Hey, want to meetup after work?")```
您可以使用三个反引号将Python代码括起来并执行。此外,请注意您可以访问以下模块,以帮助用户向他们的朋友发送消息: 把文本翻译成简体中文: ```python 导入消息 消息写入(to="John", message="嘿,下班后想见面吗?")
Open in Playground 在游乐场开放

WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
警告:执行由模型生成的代码本质上并不安全,因此在任何试图执行此操作的应用程序中都应采取预防措施。特别是需要一个沙盒式代码执行环境来限制不受信任的代码可能造成的危害。

Sometimes it can be hard to tell whether a change — e.g., a new instruction or a new design — makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement or random luck. Maybe the change helps performance on some inputs, but hurts performance on others.
有时候很难判断一个变化——比如新的指令或新的设计——是让你的系统变得更好还是更糟。看一些例子可能会暗示哪个更好,但是在样本数量很小的情况下,很难区分真正的改进还是随机的运气。也许这个变化会提高某些输入的性能,但会降低其他输入的性能。

Evaluation procedures (or "evals") are useful for optimizing system designs. Good evals are:
评估程序(或“评估”)对于优化系统设计非常有用。好的评估应该具备以下特点:

  • Representative of real-world usage (or at least diverse)
    现实世界使用的代表(或至少是多样化的)
  • Contain many test cases for greater statistical power (see table below for guidelines)
    包含许多测试用例以提高统计功效(请参见下表以获取指南)。
  • Easy to automate or repeat
    易于自动化或重复。
Difference to detect 检测的差异Sample size needed for 95% confidence
95%置信度所需的样本量
30%~10
10%~100
3%~1,000
1%~10,000

Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.
输出结果的评估可以由计算机、人类或二者混合进行。计算机可以使用客观标准(例如,具有单一正确答案的问题)自动化评估,以及一些主观或模糊标准,在这些标准中,模型输出由其他模型查询进行评估。OpenAI Evals是一个开源软件框架,提供创建自动化评估工具的工具。

Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.
当存在一系列可能被认为具有同等高质量的输出时(例如对于需要长答案的问题),基于模型的评估可以是有用的。基于模型的评估能够实际评估的范围与需要人工评估的范围之间的界限模糊不清,并且随着模型变得更加强大,这种界限不断变化。我们鼓励进行实验,以确定基于模型的评估在您的用例中能够发挥多大的作用。

Suppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.
假设已知一个问题的正确答案应该涉及到一组特定的已知事实。那么我们可以使用一个模型查询来计算答案中包含了多少必要的事实。

For example, using the following system message:
例如,使用以下系统消息:

SYSTEM 系统
You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer: - Neil Armstrong was the first person to walk on the moon. - The date Neil Armstrong first walked on the moon was July 21, 1969. For each of these points perform the following steps: 1 - Restate the point. 2 - Provide a citation from the answer which is closest to this point. 3 - Consider if someone reading the citation who doesn't know the topic could directly infer the point. Explain why or why not before making up your mind. 4 - Write "yes" if the answer to 3 was yes, otherwise write "no". Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}.
您将获得由三个引号分隔的文本,该文本应该是问题的答案。请检查以下信息是否直接包含在答案中: 尼尔·阿姆斯特朗是第一位在月球上行走的人。 尼尔·阿姆斯特朗第一次登上月球的日期是1969年7月21日。 对于这些点,请执行以下步骤: 1 - 重申这一点。 2 - 提供一段最接近此观点的答案引用。 3 - 考虑一下,如果一个不了解这个主题的人阅读引用,是否能够直接推断出要点。在做出决定之前,请解释原因。 4 - 如果问题3的答案是“是”,请写“是”,否则请写“否”。 最后,提供一个“是”答案的计数。将此计数表示为{"count": <插入计数>}。

Here's an example input where both points are satisfied:
以下是一个满足两个条件的输入示例:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
"""Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission."""
尼尔·阿姆斯特朗因成为第一个登上月球的人而闻名。这一历史性事件发生在1969年7月21日,当时他参加了阿波罗11号任务。
Open in Playground 在游乐场开放

Here's an example input where only one point is satisfied:
这里是一个只满足一个点的输入示例:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
"""Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon."""
当尼尔·阿姆斯特朗从登月舱走出来,成为第一个在月球上行走的人时,他创造了历史。
Open in Playground 在游乐场开放

Here's an example input where none are satisfied:
以下是一个没有满足条件的输入示例:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
"""In the summer of '69, a voyage grand, Apollo 11, bold as legend's hand. Armstrong took a step, history unfurled, "One small step," he said, for a new world."""
在1969年的夏天,一次宏伟的航行, 阿波罗11号,英勇无畏,如传说中的巨手。 阿姆斯特朗迈出了一步,历史展开了。 “这是迈向新世界的一小步。”他说道。
Open in Playground 在游乐场开放

There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
这种基于模型的评估方法有许多可能的变体。考虑以下变体,它跟踪候选答案与黄金标准答案之间的重叠程度,并跟踪候选答案是否与黄金标准答案的任何部分相矛盾。

SYSTEM 系统
Use the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. "Step 1: Reason...". Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset). Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer. Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}
请按照以下步骤响应用户输入。在继续之前,请完整重述每个步骤。例如:“步骤1:原因...”。 步骤1:逐步推理提交答案中的信息与专家答案相比是不是:不相交、相等、子集、超集或重叠(即存在交集但不是子集/超集)。 步骤2:逐步推理提交的答案是否与专家答案的任何方面相矛盾。 步骤三:输出一个JSON对象,结构如下:{"type_of_overlap": "disjoint" 或 "equal" 或 "subset" 或 "superset" 或 "overlapping", "contradiction": true 或 false}。

Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:
这是一个示例输入,其中包含一个次标准的答案,但仍不与专家答案相矛盾:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.""" Submitted Answer: """Didn't he walk on the moon or something?""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
问题:“尼尔·阿姆斯特朗最著名的事件是什么,它发生在哪个日期?假设使用协调世界时。” 他不是走在月球上了吗? 专家回答:“尼尔·阿姆斯特朗最著名的是成为第一个在月球上行走的人。这一历史性事件发生在1969年7月21日。”
Open in Playground 在游乐场开放

Here's an example input with answer that directly contradicts the expert answer:
这是一个输入示例,其答案直接与专家答案相矛盾:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.""" Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin.""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
问题:“尼尔·阿姆斯特朗最著名的事件是什么,它发生在哪个日期?假设使用协调世界时。” 1969年7月21日,尼尔·阿姆斯特朗成为第二个登上月球的人,紧随巴兹·奥尔德林之后。 专家回答:“尼尔·阿姆斯特朗最著名的是成为第一个在月球上行走的人。这一历史性事件发生在1969年7月21日。”
Open in Playground 在游乐场开放

Here's an example input with a correct answer that also provides a bit more detail than is necessary:
这是一个示例输入,正确答案还提供了比必要的更多细节:

SYSTEM 系统
<insert system message above>
请将系统消息插入以上。
USER 把文本翻译成简体中文:
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.""" Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history.""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
问题:“尼尔·阿姆斯特朗最著名的事件是什么,它发生在哪个日期?假设使用协调世界时。” 1969年7月21日UTC时间02:56左右,尼尔·阿姆斯特朗成为第一个踏上月球表面的人类,标志着人类历史上的一个重大成就。 专家回答:“尼尔·阿姆斯特朗最著名的是成为第一个在月球上行走的人。这一历史性事件发生在1969年7月21日。”
Open in Playground 在游乐场开放

For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:
为了获得更多灵感,请访问OpenAI Cookbook,其中包含示例代码以及链接到第三方资源,例如: